Feature/multiserver plugin #3421

Open
Muddyblack wants to merge 4 commits into ipspace:dev from Muddyblack:feature/multiserver-plugin

Conversation

@Muddyblack
Collaborator

Reference: #3420

Summary

This PR adds the multiserver plugin, which distributes a single Netlab topology across multiple physical servers.
For now, it supports only the containerlab provider.

Key Details

  • Self-contained: The implementation is entirely within netsim/extra/multiserver/ and doesn't modify any core Netlab engine logic.
  • Consistent Allocations: IP, interface, and VNI allocations are computed once on the workstation. The plugin then generates a per-server directory with a filtered clab.yml and netlab.snapshot.pickle.
  • Native Remote Deployments: Remote servers launch using standard sudo netlab up --snapshot -vv without needing custom CLI options.

I am not sure whether the test files make sense, but they at least show that the plugin does not interfere with the normal netlab workflow.

Explanations on how it works can be found in docs/plugins/multiserver.md.


```{warning}
* The *multiserver* plugin requires the **containerlab** provider on all servers.
* Containerlab version >= `0.46` is required for native VXLAN link endpoint support.
Owner

Just a thought: the plugin probably does not work with old releases, and we're enforcing 0.75.0 right now.

Collaborator Author

@Muddyblack Muddyblack May 25, 2026

What do you mean?

True, I only tried it on 0.75, but it should work on earlier releases too. It only requires a netlab version that uses the pickle system for its storage.

Netlab already enforces a higher version than 0.46, I think, no?

Owner

Netlab already enforces a higher version than 0.46, I think, no?

That was the point. You don't have to mention the containerlab version.

Owner

@ipspace ipspace left a comment

Super-awesome-job!!! Thanks a million.

Tons of comments (as you expected ;). Some of them are just suggestions or pointers to existing helper functions; in other cases I think we can make the whole thing a lot more streamlined with significant rewrites.

(multiserver-servers)=
### Server Parameters

Each entry in the **multiserver.servers** list supports these parameters:
Owner

Would it make more sense to have a dictionary of servers?


| Parameter | Type | Meaning |
|-----------|------|---------|
| **id** | integer | Unique identifier for the server (e.g. `1`, `2`) |
Owner

ID could be assigned automatically, like we do for nodes. We even have a set of functions (modules/_dataplane) to handle IDs where some objects have a static ID and others need auto-assigned ones -- used for VLANs, VRFs, and the like
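
To illustrate the suggestion, here is a minimal sketch of mixing static and auto-assigned IDs. It does not use the netlab `modules/_dataplane` helpers (their API is not shown in this conversation); the `assign_server_ids` function and the `name` key are hypothetical, and only the `id` parameter comes from the documentation excerpt above.

```python
def assign_server_ids(servers):
    """Fill in missing 'id' values while leaving static IDs untouched.

    'servers' is a list of plain dicts (hypothetical stand-in for the
    multiserver.servers entries).
    """
    static_ids = [s["id"] for s in servers if "id" in s]
    used = set(static_ids)
    if len(used) != len(static_ids):
        raise ValueError("duplicate static server id")

    next_id = 1
    for server in servers:
        if "id" in server:
            continue                    # static ID wins, nothing to do
        while next_id in used:          # skip IDs claimed by static entries
            next_id += 1
        server["id"] = next_id
        used.add(next_id)
    return servers
```

For example, given three servers where only the second one has a static `id` of `1`, the first and third servers would receive `2` and `3`.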

In `auto` mode, nodes that are not explicitly pinned to a server are distributed automatically using a greedy balancing algorithm:

1. Nodes belonging to a *netlab* group are kept together — the entire group is placed on the server that currently has the fewest nodes. Larger groups are placed first for better balance.
2. Remaining ungrouped nodes are assigned one at a time to the least-loaded server.
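
The two steps above can be sketched as follows. This is an illustration of the documented behavior, not the plugin's code: the function name, the argument shapes, and the tie-breaking rule (first server in the list wins) are assumptions, and "load" is taken to mean the node count per server.

```python
def assign_nodes(servers, groups, ungrouped, pinned=None):
    """Greedy placement; a server's load is simply its node count.

    servers:   list of server names (hypothetical shape)
    groups:    dict of group name -> list of node names
    ungrouped: list of node names that belong to no group
    pinned:    optional dict of node name -> server name
    """
    placement = dict(pinned or {})
    load = {server: 0 for server in servers}
    for server in placement.values():
        load[server] += 1

    def least_loaded():
        # Assumption: ties go to the server listed first
        return min(servers, key=lambda s: (load[s], servers.index(s)))

    # Step 1: keep each group together, placing larger groups first
    for members in sorted(groups.values(), key=len, reverse=True):
        free = [n for n in members if n not in placement]
        if not free:
            continue
        target = least_loaded()
        for node in free:
            placement[node] = target
        load[target] += len(free)

    # Step 2: place the remaining nodes one at a time
    for node in ungrouped:
        if node in placement:
            continue
        target = least_loaded()
        placement[node] = target
        load[target] += 1

    return placement
```

How the plugin treats a pinned node that also belongs to a group is not stated here; the sketch leaves pinned nodes where they are and places only the unpinned group members together.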
Owner

Haven't looked at the code yet, but I'm guessing that "loaded" means "number of nodes", not something more complex like CPU/RAM requirements of lab devices? If that's the case, it might be worth spelling it out.


### Automatic Assignment

In `auto` mode, nodes that are not explicitly pinned to a server are distributed automatically using a greedy balancing algorithm:
Owner

In some future version, you might want to add server capabilities for weighted distribution ;)


The plugin automatically copies all required files into each server directory — no extra bundling step is needed.

**Step 2: Copy server directories to remote hosts** (e.g. via rsync):
Owner

We might want to automate that in the future, but this is definitely more than good enough for version 1


def _intf_clab_name(intf: Box) -> str:
    """Containerlab interface name for a node interface."""
    return intf.get("clab", {}).get("name", "") or intf.get("ifname", "")
Owner

intf.get("clab.name","") or intf.get("ifname","")

See "Using Box objects" in AGENTS.md
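
The suggested form relies on dotted-key lookup in Box objects. The sketch below mimics that lookup with plain dictionaries so the equivalence with the chained `.get()` calls is easy to see. The `get_dotted` helper is hypothetical and is not the Box implementation.

```python
def get_dotted(data, path, default=None):
    """Walk a nested dict along a dotted path, returning default on a miss."""
    node = data
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


def intf_clab_name(intf):
    """Prefer the containerlab-specific name, fall back to ifname."""
    return get_dotted(intf, "clab.name", "") or intf.get("ifname", "")
```

The single dotted lookup replaces the intermediate `{}` default that the chained version needs to avoid calling `.get()` on a missing `clab` key.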

return intf.get("clab", {}).get("name", "") or intf.get("ifname", "")


def _build_clab_node(nname: str, ndata: Box, topology: Box) -> dict:
Owner

It looks like this whole part could be easier to do with a topology filter (something similar to "remove unmanaged nodes") followed by an augmented clab.yml Jinja2 template. I also wouldn't have a problem adding VXLAN support (as clab attributes) into netlab core and then using those attributes here.
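
As a rough illustration of the topology-filter half of this idea, the sketch below trims a topology to the nodes placed on one server and keeps every link with at least one local endpoint, mirroring the link filter visible in the PR diff. It uses plain dictionaries instead of Box objects, and the `filter_topology` name is hypothetical. The template half is omitted.

```python
import copy


def filter_topology(topology, local_nodes):
    """Return a per-server copy of the topology; the input is not modified."""
    topo = copy.deepcopy(topology)

    # Keep only the nodes placed on this server
    topo["nodes"] = {
        name: data for name, data in topo["nodes"].items() if name in local_nodes
    }

    # Keep links that touch at least one local node; links to remote nodes
    # survive so the cross-server part can be handled separately
    topo["links"] = [
        link
        for link in topo["links"]
        if any(i["node"] in local_nodes for i in link.get("interfaces", []))
    ]
    return topo
```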


# ===========================================================================
# Internal helpers — clab.yml generation
# ===========================================================================
Owner

This whole section should be completely restructured. We're reinventing the wheel (in Python)

topo_copy.links = [l for l in topo_copy.links if any(i.node in local_nodes for i in l.get("interfaces", []))]

# Expand paths (add f_files / f_tasks / f_dirs computed keys).
make_paths_absolute(topo_copy.defaults.paths)
Owner

I'm afraid there are lots of assumptions here, starting with "the directory structure MUST be the same on all the servers and the control node"

pickle.dump(topodict, f)


def _write_vxlan_scripts(out_dir: str, tunnels: list, dev: str) -> None:
Owner

Any reason why you wouldn't use a Jinja2 template for this?
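
To show what a template-driven version might look like, here is a small sketch that uses the standard-library `string.Template` as a stand-in for Jinja2 so it runs without extra dependencies. The script contents of the PR are not shown in this conversation, so the tunnel fields (`name`, `vni`, `remote`), the fixed UDP port, and the `render_vxlan_script` name are all assumptions.

```python
from string import Template

# One tunnel = one VXLAN interface created with iproute2 (assumed layout)
VXLAN_UP = Template(
    "ip link add $name type vxlan id $vni remote $remote dstport 4789 dev $dev\n"
    "ip link set $name up"
)


def render_vxlan_script(tunnels, dev):
    """Render a shell script that brings up every tunnel in the list."""
    lines = ["#!/bin/sh", "set -e"]
    for tunnel in tunnels:
        # Tunnel fields fill the template; 'dev' is shared by all tunnels
        lines.append(VXLAN_UP.substitute(tunnel, dev=dev))
    return "\n".join(lines) + "\n"
```

Keeping the commands in a template separates the script text from the Python logic, which is the main point of the question above; a Jinja2 template would add loops and conditionals inside the template itself.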


Development

Successfully merging this pull request may close these issues.

Proposal: multiserver plugin to easily split topologies across physical servers

2 participants